Adversarial deep learning aims to train robust DNNs against adversarial attacks, and it is one of the major research directions in deep learning. Game theory has been used to answer some of the fundamental questions about adversarial deep learning, such as the existence of a classifier with optimal robustness and the existence of optimal adversarial samples for a given class of classifiers. In most previous work, adversarial deep learning was formulated as a simultaneous game, and the strategy spaces were assumed to be certain probability distributions so that Nash equilibria exist. However, this assumption does not hold in practical situations. In this paper, we give answers to these fundamental questions for the practical case where the classifiers are DNNs with a given structure, by formulating adversarial deep learning as sequential games. The existence of Stackelberg equilibria for these games is proved. Furthermore, when the Carlini-Wagner margin loss is used, the equilibrium DNN has the largest adversarial accuracy among all DNNs with the same structure. The trade-off between robustness and accuracy in adversarial deep learning is also studied from the game-theoretical perspective.
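For reference, here is a minimal PyTorch sketch of the Carlini-Wagner margin loss mentioned above; the function name, the batch averaging, and the `kappa` confidence margin are our own choices rather than the paper's exact formulation.

```python
import torch

def cw_margin_loss(logits: torch.Tensor, labels: torch.Tensor,
                   kappa: float = 0.0) -> torch.Tensor:
    """Carlini-Wagner margin loss: max(max_{i != y} Z_i - Z_y, -kappa).

    Positive values mean the sample is misclassified. In the sequential
    game, the adversary (follower) maximizes this quantity while the
    classifier (leader) minimizes it.
    """
    true_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Mask out the true class before taking the runner-up logit.
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    runner_up = masked.max(dim=1).values
    return torch.clamp(runner_up - true_logit, min=-kappa).mean()
```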
In this paper, the bias classifier is introduced, that is, the bias part of a DNN with ReLU as its activation function is used as a classifier. The work is motivated by the fact that the bias part is a piecewise constant function with zero gradient, and hence cannot be directly attacked by gradient-based methods such as FGSM to generate adversaries. The existence of the bias classifier is proved, and an effective training method for it is proposed. It is proved that by adding a proper random first-degree part to the bias classifier, an information-theoretically safe classifier against attacks based on the gradient of the original model is obtained, in the sense that such an attack produces a totally random attacking direction. This seems to be the first time that the concept of an information-theoretically safe classifier has been proposed. Several attack methods against the bias classifier are proposed, and numerical experiments show that, in most cases, the bias classifier is more robust against these attacks than DNNs.
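To make the zero-gradient argument concrete, the sketch below extracts the bias part of a piecewise-linear network by subtracting its local linear part; this is our illustration of the decomposition, not the paper's training code.

```python
import torch

def bias_part(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Bias part of a piecewise-linear (ReLU) network for one input x.

    On the linear region containing x, f(x) = J x + b, so the bias part
    b = f(x) - J x is piecewise constant: its gradient w.r.t. x vanishes
    almost everywhere, which is why FGSM-style attacks get no signal.
    Assumes model maps a flat vector of shape (d,) to logits of shape (C,).
    """
    jac = torch.autograd.functional.jacobian(model, x)  # shape (C, d)
    return model(x) - jac @ x

# Classification with the bias classifier: argmax over the bias part.
# pred = bias_part(model, x).argmax()
```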
In this paper, the robust classification-autoencoder (CAE) is proposed, which has a strong ability to recognize outliers and defend against adversaries. The main idea is to change the autoencoder from an unsupervised learning model into a classifier, where the encoder is used to compress samples with different labels into disjoint compression spaces and the decoder is used to recover samples from their compression spaces. The encoder thus serves both as a compressed-feature learner and as a classifier, and the decoder is used to decide whether the classification given by the encoder is correct, by comparing the input sample with the output. Since adversarial samples seem inevitable for the current DNN framework, a list classifier for defending against adversaries is introduced based on the CAE, which outputs several labels and the corresponding samples recovered by the CAE. Extensive experimental results show that the CAE identifies outliers effectively, finding almost all of them, and that the list classifier gives nearly lossless classification in the sense that the output list contains the correct label for almost all adversaries while the size of the output list remains reasonably small.
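A schematic of the decision rule as we read it from this abstract follows; `centers`, the distance-based region test, and the threshold `tau` are simplifications standing in for the paper's learned per-class compression spaces.

```python
import torch

def cae_predict(encoder, decoder, x, centers, tau):
    """Schematic CAE decision rule (our reading of the abstract).

    `centers` holds one reference code per class, standing in for the
    disjoint per-class compression spaces; `tau` is a reconstruction
    threshold. A sample is classified by the nearest class region and
    flagged as an outlier/adversary when the decoder cannot reproduce it.
    """
    z = encoder(x)
    dists = torch.stack([(z - c).norm() for c in centers])
    label = int(dists.argmin())
    recon_err = (decoder(z) - x).norm()
    return label if recon_err < tau else None   # None = rejected as outlier
```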
We study the notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data. Our refined analysis avoids the hidden constants and logarithmic factors in existing results, which are known to be crucial in the high-dimensional setting, especially for understanding interpolation learning. As a special case, our analysis recovers the guarantee of Koehler et al. (2021), which tightly characterizes the population risk of low-norm interpolators under the benign overfitting conditions. Our optimistic rate bound, however, also analyzes predictors with arbitrary training error. This allows us to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps us obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.
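Schematically, an optimistic rate has the following shape (after Srebro et al., 2010); the constant and the precise complexity term $\mathcal{R}_n$ are suppressed here, so this is a shape sketch rather than the paper's statement:

```latex
% Optimistic rate, schematic form: the population risk degrades
% gracefully with the training error, interpolating between the
% slow 1/sqrt(n) and fast 1/n regimes.
L(h) \;\le\; \hat{L}(h) \;+\;
  C\left(\sqrt{\hat{L}(h)}\,\mathcal{R}_n + \mathcal{R}_n^{2}\right)
\qquad \text{simultaneously for all } h.
```

Setting $\hat{L}(h)=0$ recovers the fast $\mathcal{R}_n^{2}$ rate relevant to interpolation learning, while predictors with large training error fall back to the slower $\sqrt{\hat{L}(h)}\,\mathcal{R}_n$ regime.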
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generalization error bound for interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum-L1-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain, and be used to analyze, benign overfitting, at least in some settings.
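The flavor of such a Gaussian-width bound, with constants, noise-level terms, and conditioning suppressed (a schematic, not the paper's exact statement):

```latex
% w(K) is the Gaussian width of the hypothesis class K. Every h in K
% that exactly interpolates the n training samples generalizes once
% w(K) is small relative to sqrt(n); delta is the failure probability.
\sup_{h \in \mathcal{K}\,:\;\hat{L}_n(h) = 0} L(h)
  \;\lesssim\; \frac{\bigl(w(\mathcal{K}) + \sqrt{\log(1/\delta)}\bigr)^{2}}{n}
```

Instantiating $\mathcal{K}$ as a Euclidean ball gives the minimum-L2-norm (ridgeless) case, while the simplex instantiation yields the basis-pursuit result mentioned above.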
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
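As a toy illustration of image-level alignment (the paper's global photometric alignment module is learned and more involved), matching per-channel global statistics between domains looks like this:

```python
import numpy as np

def photometric_align(target_img: np.ndarray, source_img: np.ndarray) -> np.ndarray:
    """Illustrative global photometric alignment (not the paper's exact module).

    Shifts each channel of `target_img` so its global mean/std match
    `source_img`, reducing low-level image-statistics gaps between domains.
    Inputs are float arrays of shape (H, W, C) with values in [0, 1].
    """
    out = np.empty_like(target_img)
    for c in range(target_img.shape[2]):
        t, s = target_img[..., c], source_img[..., c]
        out[..., c] = (t - t.mean()) / (t.std() + 1e-6) * s.std() + s.mean()
    return np.clip(out, 0.0, 1.0)
```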
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
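A minimal sketch of how a style code might modulate a feed-forward layer, in the spirit of the style-aware adaptive transformer described above; the sigmoid gating and single-scale modulation are our simplifications, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StyleAwareFeedForward(nn.Module):
    """Sketch: a style code adjusts the feed-forward channels (our simplification)."""

    def __init__(self, dim: int, hidden: int, style_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.to_scale = nn.Linear(style_dim, hidden)  # style -> channel scales

    def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim) token features; style: (B, style_dim) style code.
        scale = torch.sigmoid(self.to_scale(style))   # (B, hidden)
        h = self.fc1(x) * scale.unsqueeze(1)          # modulate FF channels per style
        return self.fc2(torch.relu(h))
```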
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, rendering predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
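The second-stage objective hinges on a photometric error; a common SSIM + L1 form used in self-supervised depth/ego-motion pipelines is sketched below (the paper may weight or compute it differently).

```python
import torch
import torch.nn.functional as F

def photometric_error(pred: torch.Tensor, target: torch.Tensor,
                      alpha: float = 0.85) -> torch.Tensor:
    """SSIM + L1 photometric error, a standard choice in this line of work.

    `pred` is the target frame re-synthesized from another view using the
    predicted depth and ego-motion; both tensors have shape (B, 3, H, W).
    """
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    l1 = (pred - target).abs().mean(1, keepdim=True)
    mu_p = F.avg_pool2d(pred, 3, 1, 1)
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    var_p = F.avg_pool2d(pred ** 2, 3, 1, 1) - mu_p ** 2
    var_t = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, 3, 1, 1) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    ssim = ((1 - ssim) / 2).clamp(0, 1).mean(1, keepdim=True)
    return (alpha * ssim + (1 - alpha) * l1).mean()
```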
Increasing research interest focuses on sequential recommender systems, aiming to model dynamic sequence representations precisely. However, the most commonly used loss functions in state-of-the-art sequential recommendation models have essential limitations. To name a few: Bayesian Personalized Ranking (BPR) loss suffers from vanishing gradients caused by extensive negative sampling and prediction biases; Binary Cross-Entropy (BCE) loss is sensitive to the number of negative samples, so it is likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss only attends to the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representations. To avoid these limitations, in this paper we propose to calculate the Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, and enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing the CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. Using CCE, the performance curve of the models on the test data rises rapidly with wall-clock time, and is superior to that of other loss functions throughout almost the whole process of model training.
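As we read the abstract, CCE is a full-softmax cross-entropy accumulated over every timestamp of the sequence rather than only the last one, with no negative sampling; a minimal PyTorch sketch (shape conventions are ours):

```python
import torch
import torch.nn.functional as F

def cce_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cumulative Cross-Entropy over the whole training sequence.

    logits:  (B, T, num_items) scores over the full item catalog per step
    targets: (B, T) ground-truth next item at each step

    Cross-entropy is accumulated across all B*T positions; padded steps
    could be skipped via F.cross_entropy's ignore_index argument.
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (B*T, num_items)
        targets.reshape(-1),                  # (B*T,)
    )
```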
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative, whose goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, which is a powerful tool in aiding image understanding but has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
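A schematic of the multi-task objective described above, combining multi-scale pixel restoration with a siamese comparison term; the names, the MSE restoration term, and the cosine form of the comparison are our assumptions, not the released PCRLv2 code.

```python
import torch
import torch.nn.functional as F

def multiscale_ssl_loss(decoder_outs, target, feat_a, feat_b, lam=1.0):
    """Sketch: pixel restoration at several pyramid levels plus a
    siamese feature-comparison term, in the spirit of the abstract.

    decoder_outs: list of restored images, one per pyramid scale
    target:       the original image to restore, shape (B, C, H, W)
    feat_a/b:     pooled features of the two siamese views, (B, D)
    """
    # Multi-scale pixel restoration: one reconstruction per pyramid level,
    # each compared against the target resized to that level's resolution.
    restore = sum(F.mse_loss(out, F.interpolate(target, size=out.shape[2:]))
                  for out in decoder_outs) / len(decoder_outs)
    # Siamese comparison: negative cosine similarity between the two views.
    compare = 1 - F.cosine_similarity(feat_a, feat_b, dim=-1).mean()
    return restore + lam * compare
```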